[Article Voiceover] Llama 3.2 Vision and Molmo: Foundations for the multimodal open-source ecosystem

Update: 2024-09-27
Description

Sorry this one was late! Thanks for bearing with me, and keep sending feedback my way. I'm still a year or two away from having time to record these myself, but I would love to.

Open-source tools, examples, limits, and the state of training multimodal models.
This is AI-generated audio, produced with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/molmo-and-llama-3-vision

00:00 Llama 3.2 Vision and Molmo: Foundations for the multimodal open-source ecosystem
02:47 Llama vision: Multimodality for the masses of developers
03:27 Molmo: a (mostly) open-source equivalent to Llama vision
08:45 How adding vision changes capabilities and reasoning
11:47 Multimodal language models: Earlier on the exponential

Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_013.png
Fig 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_015.png
Fig 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_021.png
Fig 4: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_023.png
Fig 5: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_027.png
Fig 6: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_030.png
Fig 7: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_037.png
Fig 8: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_046.png
Fig 9: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_048.png
Fig 10: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_050.png
Fig 11: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_052.png
Fig 12: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_054.png
Fig 13: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_058.png
Fig 14: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_065.png



Get full access to Interconnects at www.interconnects.ai/subscribe

Nathan Lambert